Comment Extraction from Blog Posts and Its Applications to Opinion Mining

Authors

  • Huan-An Kao
  • Hsin-Hsi Chen
Abstract

Blog posts that contain personal experiences or perspectives on specific subjects are a useful source of information. Blogs let readers interact with bloggers by commenting on specific posts. These comments carry readers' viewpoints on the targets described in the post, or a supportive or non-supportive attitude toward the post itself. Comment extraction is challenging because blog service providers do not share a single page template. This paper proposes methods to deal with this problem. First, repetitive patterns and their corresponding blocks are extracted from input posts by a pattern identification algorithm. Second, three filtering strategies (tag pattern loop filtering, rule overlap filtering, and longest rule first) are used to remove non-comment blocks. Finally, a comment/non-comment classifier is learned to distinguish comment blocks from non-comment blocks using 14 block-level features and 5 rule-level features. In the experiments, we randomly select 600 blog posts from 12 blog service providers. Using all three filtering strategies together with selected features, F-measure, recall, and precision are 0.801, 0.855, and 0.780, respectively. The application of comment extraction to blog mining is also illustrated: we show how to identify the relevant opinionated objects (opinion holders, opinions, and targets) in posts.
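To make the first step more concrete, here is a minimal Python sketch of what detecting repetitive tag patterns in a page could look like. It is not the paper's pattern identification algorithm, which the abstract does not specify. It is a naive stand-in that flattens a page into a sequence of opening-tag names and reports tag n-grams that repeat back to back. The names `TagSequencer`, `tag_sequence`, and `repeated_patterns`, along with the parameters `min_len`, `max_len`, and `min_repeats`, are hypothetical and chosen only for illustration.

```python
from html.parser import HTMLParser


class TagSequencer(HTMLParser):
    """Flatten an HTML document into a list of opening-tag names."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tag_sequence(html):
    """Return the opening-tag names of `html` in document order."""
    parser = TagSequencer()
    parser.feed(html)
    parser.close()
    return parser.tags


def repeated_patterns(tags, min_len=2, max_len=6, min_repeats=2):
    """Find tag n-grams that occur at least `min_repeats` times in a row.

    Returns {pattern: [start indices]}. This is an assumed, simplified
    detector: it scans left to right and does not merge rotated variants
    of the same pattern, which a real system would need to filter.
    """
    found = {}
    for n in range(min_len, max_len + 1):
        i = 0
        while i + n <= len(tags):
            pattern = tuple(tags[i:i + n])
            starts = [i]
            j = i + n
            # Count consecutive copies of the same n-gram.
            while tuple(tags[j:j + n]) == pattern:
                starts.append(j)
                j += n
            if len(starts) >= min_repeats:
                found.setdefault(pattern, []).extend(starts)
                i = j  # skip past the whole repeated run
            else:
                i += 1
    return found


# Toy page: one post title followed by three comment-like blocks.
page = (
    "<div><h1>Post</h1>"
    "<div><span>a</span><p>c1</p></div>"
    "<div><span>b</span><p>c2</p></div>"
    "<div><span>c</span><p>c3</p></div>"
    "</div>"
)
candidates = repeated_patterns(tag_sequence(page))
print(candidates)  # {('div', 'span', 'p'): [2, 5, 8]}
```

On this toy page the only repeated pattern is `div, span, p`, which marks the three comment-like blocks. On real pages a detector like this also fires on navigation menus, archive lists, and similar repeated structures, which is why the abstract's filtering strategies and comment/non-comment classifier are needed afterwards.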


Similar resources

Polarity Detection in Blog Comments from Blog RSS Feed by Modified TF-IDF Algorithm

Blogs are the most common medium on the web where users post their opinions. A blog is considered a web space where users share their views, beliefs, and philosophies. Blogs posted across the web can be extracted from their RSS feeds. Once a blog is posted, several readers leave comments on it. Analyzing these comments can help in finding the opinion...


Blog Post Extraction Using Title Finding

Linhai Song, Xueqi Cheng, Yan Guo, Bo Wu, Yu Wang. Institute of Computing Technology, Chinese Academy of Sciences, Beijing; Graduate School of the Chinese Academy of Sciences, Beijing. Abstract: With the development of Web 2.0, web mining applications pay more attention to blog pages. In order to prevent noise in blog pages from affecting the precision of web mining algorithms, it is very ...


Managing Risk and Enhancing Discoverability of Opinion From Online Reviews Using Classification Algorithm

Millions of reviews and opinions on every kind of subject are posted on blogs, forums, and online sites. This enormous amount of information on worldwide network platforms can serve as a source for applications based on opinion mining and review analysis. The aim of this paper is to discover opinions from online reviews and to manage future risk. Our proposed metho...


Extracting Aspect-Evaluation and Aspect-Of Relations in Opinion Mining

The technology of opinion extraction allows users to retrieve and analyze people’s opinions scattered over Web documents. We define an opinion unit as a quadruple consisting of the opinion holder, the subject being evaluated, the part or the attribute in which the subject is evaluated, and the value of the evaluation that expresses a positive or negative assessment. We use this definition as th...


Applying Opinion Mining to OSS selection process

Mining opinions in free text so as to separate the opinionated and the relevant content in text is a challenging research problem. People and organizations that are considering the adoption of Open-Source Software (OSS), or that need to choose among different OSS products are interested in knowing the user community’s opinion, since this can provide useful indications about the strengths and li...




Publication date: 2010